# End-to-end training

Ade20k Panoptic Eomt Large 640

This model is based on a paper that reinterprets the Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.

Image Segmentation

Ade20k Panoptic Eomt Giant 640

This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture specifically for segmentation.

Image Segmentation

The Magician is the first multimodal large language model with free-form multi-image localization capabilities. It achieves precise localization in complex multi-image scenarios and outperforms models at the 70B scale.

Transformers English

A large language model built from scratch, with fully open-source implementations of tokenizer training, model initialization, pre-training, and instruction fine-tuning.

Large Language Model

Detr Resnet 50 Sku110k

This DETR model has been trained end-to-end on the SKU110K object detection dataset with the number of queries set to 400, suitable for scenarios like product shelf detection.

Object Detection

Segformer B0 Finetuned V0

An image segmentation model based on nvidia/mit-b0 and fine-tuned on the tontokoton/artery-ultrasound-siit dataset.

Image Segmentation

EnCodec is a high-fidelity real-time neural audio codec developed by Meta AI, employing end-to-end training and supporting multiple bandwidth settings.

Audio Generation

Deformable Detr Detic

An object detection model that uses the deformable detection transformer (Deformable DETR) architecture, trained on the LVIS dataset, which contains 1,203 categories.

Object Detection

Imclasif Genres V001

This is an image classification model generated by HuggingPics, primarily used for classifying images of specific types (genres).

Image Classification

Gender Classification

An image classification model generated by HuggingPics for identifying gender (male or female) in images.

Image Classification

Yolos Small Balloon

YOLOS is an object detection model that uses the Vision Transformer (ViT) architecture. It is trained with the DETR loss and fine-tuned on the COCO and Matterport Balloon datasets.

Object Detection

Wav2vec2 2 Bart Large No Adapter

This model is an automatic speech recognition (ASR) model trained on the LibriSpeech ASR dataset, capable of converting English speech into text.

Speech Recognition

An automatic speech recognition model trained on the LibriSpeech ASR dataset, designed to convert English speech into text.

Speech Recognition

Wav2vec2 Tiny Random Robust

A lightweight automatic speech recognition (ASR) model, based on a randomly initialized version of the Wav2Vec2 architecture, designed for robustness testing.

Speech Recognition

Transformers English

patrickvonplaten

Kan Bayashi Ljspeech Fastspeech2

A FastSpeech2 text-to-speech (TTS) model trained on the LJSpeech dataset with the ESPnet framework.

Speech Synthesis English

Wav2vec2 2 Bert Large No Adapter Frozen Enc

This model is a speech recognition model trained on the librispeech_asr dataset, achieving a word error rate (WER) of 2.0133 on the evaluation set.

Speech Recognition
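
The entry above reports a word error rate but does not say how the figure was computed or scaled. As a minimal sketch of the standard definition only (word-level edit distance divided by the number of reference words), the function below is a hypothetical helper written for illustration, not the evaluation code used for this model:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" vs "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors while the denominator is the reference length, this ratio can exceed 1 when the hypothesis is much longer than the reference.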

© 2025 AIbase